
Comparing Deterministic and Soft Policy Gradients for Optimizing Gaussian Mixture Actors

Dey, Sheelabhadra; Sharon, Guni (December 2024, Transactions on Machine Learning Research)

Larochelle, Hugo; Murray, Naila; Kamath, Gautam; Shah, Nihar B (Ed.)

Gaussian Mixture Models (GMMs) have been recently proposed for approximating actors in actor-critic reinforcement learning algorithms. Such GMM-based actors are commonly optimized using stochastic policy gradients along with an entropy maximization objective. In contrast to previous work, we define and study deterministic policy gradients for optimizing GMM-based actors. Similar to stochastic gradient approaches, our proposed method, denoted Gaussian Mixture Deterministic Policy Gradient (Gamid-PG), encourages policy entropy maximization. To this end, we define the GMM entropy gradient using Variational Approximation of the KL-divergence between the GMM’s component Gaussians. We compare Gamid-PG with common stochastic policy gradient methods on benchmark dense-reward MuJoCo tasks and sparse-reward Fetch tasks. We observe that Gamid-PG outperforms stochastic gradient-based methods in 3/6 MuJoCo tasks while performing similarly on the remaining 3 tasks. In the Fetch tasks, Gamid-PG outperforms single-actor deterministic gradient-based methods while performing worse than stochastic policy gradient methods. Consequently, we conclude that GMMs optimized using deterministic policy gradients (1) should be favorably considered over stochastic gradients in dense-reward continuous control tasks, and (2) improve upon single-actor deterministic gradients.
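
A GMM has no closed-form entropy, which is why the abstract turns to an approximation built from the KL-divergences between component Gaussians. The sketch below illustrates one well-known estimate of that kind: the weighted component entropies minus a log-sum of exponentiated negative pairwise KL terms. It is an assumption about the general shape of such an approximation, not necessarily the expression used in the paper. The function names are hypothetical, and diagonal covariances and strictly positive mixture weights are assumed for brevity.

```python
import math


def gaussian_entropy(sigma):
    """Differential entropy of a diagonal Gaussian with per-dimension std devs `sigma`."""
    d = len(sigma)
    return 0.5 * d * (1.0 + math.log(2.0 * math.pi)) + sum(math.log(s) for s in sigma)


def gaussian_kl(mu_a, sigma_a, mu_b, sigma_b):
    """Closed-form KL(N_a || N_b) between two diagonal Gaussians."""
    total = 0.0
    for ma, sa, mb, sb in zip(mu_a, sigma_a, mu_b, sigma_b):
        total += math.log(sb / sa) + (sa ** 2 + (ma - mb) ** 2) / (2.0 * sb ** 2) - 0.5
    return total


def gmm_entropy_estimate(weights, mus, sigmas):
    """Pairwise-KL estimate of GMM entropy (illustrative; assumed form, not the paper's).

    H ~= sum_i w_i * [ H(N_i) - log( sum_j w_j * exp(-KL(N_i || N_j)) ) ]
    Assumes every weight is strictly positive so the inner sum is never zero.
    """
    h = 0.0
    for i, w_i in enumerate(weights):
        inner = sum(
            w_j * math.exp(-gaussian_kl(mus[i], sigmas[i], mus[j], sigmas[j]))
            for j, w_j in enumerate(weights)
        )
        h += w_i * (gaussian_entropy(sigmas[i]) - math.log(inner))
    return h
```

Two limiting cases show why an estimate of this form is a reasonable entropy surrogate. When all components coincide, it reduces to the entropy of a single Gaussian. When the components are far apart, it approaches the component entropy plus the entropy of the mixture weights. Because every term is a differentiable function of the means, standard deviations, and weights, an automatic-differentiation framework could supply the gradient with respect to the actor parameters.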

